Case Study: Reddit Social Network Analysis Against an Influence Operation

Adel Abu Hashim & Mahmoud Nagy - August 2021

Table of Contents

Introduction

This case study aims to help Amber Heard by analyzing new accounts that post or comment against a victim of a social-bot disinformation/influence operation.

We have four main datasets (all scraped from Reddit):

  • 1- A dataset with submissions & comments data (2021).
  • 2- Users data (from 2006 to 2021).
  • 3- A merged dataset (submissions & comments data, users data).
  • 4- Daily creation data (# of accounts created per day from 2006 to 2021).

Exploratory Data Analysis

  • Peak Days
  • Reddit Comments/Submission Data

  • Reddit Comments

  • Reddit Submissions

    The number of negative parent comments on submissions

    This means that we have about 850 negative first comments on submissions (not replies).
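A minimal sketch of how such a count could be obtained. In Reddit's data model, a comment's `parent_id` starts with `t3_` when its parent is a submission and with `t1_` when it replies to another comment. The records and the `sentiment` field below are made up for illustration and are not the study's data.

```python
# Hypothetical comment records; only the parent_id prefix convention comes from Reddit.
comments = [
    {"id": "c1", "parent_id": "t3_abc", "sentiment": "negative"},
    {"id": "c2", "parent_id": "t1_c1", "sentiment": "negative"},
    {"id": "c3", "parent_id": "t3_abc", "sentiment": "positive"},
    {"id": "c4", "parent_id": "t3_xyz", "sentiment": "negative"},
]


def is_top_level(comment):
    """A comment is top-level when its parent is a submission (t3_ prefix)."""
    return comment["parent_id"].startswith("t3_")


# Keep negative comments that sit directly under a submission (not replies).
negative_top_level = [
    c for c in comments if is_top_level(c) and c["sentiment"] == "negative"
]
print(len(negative_top_level))
```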

    1- The # of submissions VS the # of comments

    2- NLTK vs BLOB classification

    Common results

    https://stackoverflow.com/questions/27898830/python-how-to-change-autopct-text-color-to-be-white-in-a-pie-chart

    https://stackoverflow.com/questions/31517194/how-to-hide-one-specific-cell-input-or-output-in-ipython-notebook

    https://matplotlib.org/stable/tutorials/colors/colormaps.html

    We can see that the negative class has similar results under both models (in 2018).
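One way to put a number on how close two classifiers are is to compare the share of each class and the per-text agreement rate. The sketch below assumes both models have already labelled the same texts; the label lists are hypothetical, not the study's output.

```python
from collections import Counter

# Hypothetical labels produced by two sentiment models for the same five texts.
nltk_labels = ["negative", "neutral", "positive", "negative", "neutral"]
blob_labels = ["negative", "positive", "positive", "negative", "neutral"]


def class_share(labels, target):
    """Fraction of texts assigned to the target class."""
    return Counter(labels)[target] / len(labels)


# Fraction of texts on which the two models assign the same label.
agreement = sum(a == b for a, b in zip(nltk_labels, blob_labels)) / len(nltk_labels)

print(class_share(nltk_labels, "negative"), class_share(blob_labels, "negative"))
print(agreement)
```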

    3- Investigate the text column

    Investigate the Most Negative Keywords Used (from the word cloud)

    Let's first check for users who used the word 'f*ck'.

    Jreal22

    Used the word 'f*ck' 10 times.
    Negative Comments
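A per-user keyword count of this kind could be sketched as follows. The records, author names, and the example keyword are placeholders. Whole-word, case-insensitive matching is an assumption about how the check was done.

```python
import re
from collections import Counter

# Hypothetical comment records with placeholder authors and text.
comments = [
    {"author": "user_a", "body": "What a scam. Total SCAM!"},
    {"author": "user_b", "body": "I disagree with this."},
    {"author": "user_a", "body": "scam again"},
    {"author": "user_c", "body": "Not a scammer, just wrong."},
]


def keyword_counts(records, keyword):
    """Count whole-word, case-insensitive uses of keyword per author."""
    pattern = re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)
    counts = Counter()
    for record in records:
        hits = len(pattern.findall(record["body"]))
        if hits:
            counts[record["author"]] += hits
    return counts


counts = keyword_counts(comments, "scam")
print(counts.most_common())
```

Matching on word boundaries keeps longer words that merely contain the keyword (here "scammer") out of the count.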

    Check for the repeated text

    Negative class text
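Repeated text can be detected by grouping contributions on a normalized form of the text. This sketch assumes that lowercasing and collapsing whitespace is enough normalization; the records are hypothetical.

```python
from collections import defaultdict

# Hypothetical records; two authors post the same text with different spacing/case.
records = [
    {"author": "user_a", "body": "This is  so bad"},
    {"author": "user_b", "body": "this is so bad"},
    {"author": "user_c", "body": "A different opinion"},
]


def normalize(text):
    """Lowercase and collapse runs of whitespace."""
    return " ".join(text.lower().split())


def repeated_texts(rows, min_count=2):
    """Map each normalized text seen at least min_count times to its authors."""
    authors_by_text = defaultdict(list)
    for row in rows:
        authors_by_text[normalize(row["body"])].append(row["author"])
    return {t: a for t, a in authors_by_text.items() if len(a) >= min_count}


repeated = repeated_texts(records)
print(repeated)
```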

    5- Investigate the Negative Submissions Text

    7- Check the number of words in negative texts

    Short texts with only a few words are, of course, easier for bots to generate.
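A word-count check like this can be sketched with whitespace tokenization. The texts and the three-word threshold are illustrative assumptions, not values from the study.

```python
from statistics import median

# Hypothetical negative texts.
texts = ["bad", "so bad honestly", "I read the whole thread and still disagree"]

# Whitespace tokenization is a simplification; punctuation stays attached to words.
word_counts = [len(text.split()) for text in texts]

SHORT_THRESHOLD = 3  # assumed cutoff for a "short" text
short_texts = [t for t, n in zip(texts, word_counts) if n <= SHORT_THRESHOLD]

print(word_counts, median(word_counts), short_texts)
```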

    8- Users with the most comments

    https://stackoverflow.com/questions/59100115/plotly-how-to-reverse-axes

    Further Investigate the Users With the Most Comments in 2021

    AutoModerator

    AutoModerator is a system built into reddit that allows moderators to define "rules" (consisting of checks and actions) to be automatically applied to posts in their subreddit.

    CelebBattleVoteBot

    This is a vote bot

    charliedba

    Positive Submissions

    Stanley_Elkind

    Posting negative comments (related to sex).

    Truthbetheprejudice

    Negative Submissions

    gaul66

    Voting in a positive way.

    sadwook

    Negative Comments

    This is unusual: all 32 of this user's contributions in 2021 were made on the same day, 2021-05-11.
    The account was created on 2018-12-29, the same date as the 2018 peak.
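A burst pattern like this can be checked by collecting the distinct UTC dates on which a user contributed. Reddit stores `created_utc` as Unix epoch seconds. The timestamps below are hypothetical and are built with a small helper so the example stays readable.

```python
from datetime import date, datetime, timezone


def utc_ts(year, month, day, hour=0):
    """Helper for the example: epoch seconds for a UTC date and hour."""
    return datetime(year, month, day, hour, tzinfo=timezone.utc).timestamp()


def active_days(created_utc_values):
    """Distinct UTC dates on which the contributions were made."""
    return {
        datetime.fromtimestamp(ts, tz=timezone.utc).date()
        for ts in created_utc_values
    }


# Hypothetical users: one posts everything in a single day, one is spread out.
burst_user = [utc_ts(2021, 2, 3, 8), utc_ts(2021, 2, 3, 14), utc_ts(2021, 2, 3, 23)]
regular_user = [utc_ts(2021, 1, 5), utc_ts(2021, 2, 3), utc_ts(2021, 3, 9)]

print(len(active_days(burst_user)), len(active_days(regular_user)))
```

A user whose whole year of activity falls on a single date is a candidate for closer manual review, not proof of automation.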

    Beatplayer

    Positive Comments

    9- Investigating authors with the most negative comments / submissions

    Many authors of negative comments have been banned from Reddit, which is great!


    10- Investigating authors with the most negative submissions

    Let us first investigate authors with the most submissions.

    11- Check whether the users contributing the most negative comments/submissions are moderators, have gold, or have a verified email
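The three attributes can be tallied per user list as sketched below. The field names `is_mod`, `is_gold`, and `has_verified_email` follow Reddit's user attributes; whether the study's users dataset uses exactly these column names is an assumption, and the records are hypothetical.

```python
# Hypothetical user records for the top negative contributors.
users = [
    {"name": "user_a", "is_mod": False, "is_gold": False, "has_verified_email": True},
    {"name": "user_b", "is_mod": True, "is_gold": False, "has_verified_email": True},
    {"name": "user_c", "is_mod": False, "is_gold": False, "has_verified_email": False},
]

FLAGS = ("is_mod", "is_gold", "has_verified_email")

# Number of users for which each flag is set.
flag_counts = {flag: sum(bool(u[flag]) for u in users) for flag in FLAGS}

# Users with none of the three flags set.
unflagged = [u["name"] for u in users if not any(u[flag] for flag in FLAGS)]

print(flag_counts, unflagged)
```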

    13- Investigating subreddits with the most negative comments

    Subreddits with the most negative comments are almost the same as the most used subreddits
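The overlap between the two rankings can be made explicit by intersecting the top-N lists. The subreddit names and counts below are placeholders, not the study's results.

```python
from collections import Counter

# Hypothetical subreddit of each contribution (all) and of each negative comment.
all_subreddits = ["movies", "movies", "news", "pics", "news", "movies"]
negative_subreddits = ["movies", "news", "movies"]


def top_names(values, n):
    """Names of the n most frequent values, most frequent first."""
    return [name for name, _ in Counter(values).most_common(n)]


top_all = top_names(all_subreddits, 2)
top_negative = top_names(negative_subreddits, 2)
overlap = set(top_all) & set(top_negative)

print(top_all, top_negative, overlap)
```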

    14- Submission URLs

    16- Investigating submissions with the most negative comments

    Look at numbers 4 and 5, as they are more accurate.

    This only shows the submissions with the most top-level comments.

  • Merged Users Data with Comments & Submissions Data

    Difference in time between creating the account and posting (negative)

    Posting Duration After Account Creation

    We find that 323 accounts were created and posted within the same month in 2021.
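A sketch of the creation-to-posting comparison, reading "the same month" as the same calendar month in UTC (one possible interpretation; a rolling 30-day window would be another). Account names and timestamps are hypothetical.

```python
from datetime import datetime, timezone


def utc_ts(year, month, day):
    """Helper for the example: epoch seconds for midnight UTC on a date."""
    return datetime(year, month, day, tzinfo=timezone.utc).timestamp()


def to_utc(ts):
    return datetime.fromtimestamp(ts, tz=timezone.utc)


def same_calendar_month(created_ts, first_post_ts):
    """True when the account was created in the calendar month of its first post."""
    created, posted = to_utc(created_ts), to_utc(first_post_ts)
    return (created.year, created.month) == (posted.year, posted.month)


def days_between(created_ts, first_post_ts):
    """Whole days from account creation to the first post."""
    return (to_utc(first_post_ts) - to_utc(created_ts)).days


# Hypothetical accounts: (account creation time, first negative post time).
accounts = {
    "user_a": (utc_ts(2021, 3, 2), utc_ts(2021, 3, 20)),
    "user_b": (utc_ts(2019, 6, 1), utc_ts(2021, 3, 20)),
}

same_month_accounts = [
    name for name, (created, posted) in accounts.items()
    if same_calendar_month(created, posted)
]
print(same_month_accounts, days_between(*accounts["user_a"]))
```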

    https://stackoverflow.com/questions/11520492/difference-between-del-remove-and-pop-on-lists/11520540

    This account does not look suspicious, so we remove it from suspected_list.
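Removing a cleared account by value can be sketched as below. The list contents are placeholders; only the name `suspected_list` comes from the notebook.

```python
# Hypothetical contents of the suspect list.
suspected_list = ["user_a", "user_b", "user_c"]
cleared_account = "user_b"  # placeholder for the account found not to be suspicious

# list.remove deletes the first matching value and raises ValueError if absent,
# so the membership check keeps repeated notebook runs from failing.
if cleared_account in suspected_list:
    suspected_list.remove(cleared_account)

print(suspected_list)
```

`remove` fits here because the account is known by name; `del` and `pop` work by index instead.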

    Estimated number of user accounts created in each year that have negative contributions in 2021
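Such an estimate can be sketched by counting each author once and grouping by the year of account creation. The rows below are hypothetical (author, account creation time) pairs, one per negative contribution.

```python
from collections import Counter
from datetime import datetime, timezone


def utc_ts(year, month, day):
    """Helper for the example: epoch seconds for midnight UTC on a date."""
    return datetime(year, month, day, tzinfo=timezone.utc).timestamp()


# Hypothetical rows: one per negative contribution in 2021.
rows = [
    ("user_a", utc_ts(2018, 12, 29)),
    ("user_a", utc_ts(2018, 12, 29)),
    ("user_b", utc_ts(2021, 3, 2)),
    ("user_c", utc_ts(2021, 4, 10)),
]


def accounts_per_creation_year(contribution_rows):
    """Count distinct authors by the year their account was created."""
    created_by_author = dict(contribution_rows)  # keeps one entry per author
    return Counter(
        datetime.fromtimestamp(ts, tz=timezone.utc).year
        for ts in created_by_author.values()
    )


per_year = accounts_per_creation_year(rows)
print(sorted(per_year.items()))
```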

  • Peak Days

    Which dates have the highest number of negative contributions from users?

    NOTE: there are a lot of negative contributions on 2021-04-17!
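Finding the peak date amounts to counting negative contributions per UTC day. The timestamps below are hypothetical and do not reproduce the study's data.

```python
from collections import Counter
from datetime import datetime, timezone


def utc_ts(year, month, day, hour=0):
    """Helper for the example: epoch seconds for a UTC date and hour."""
    return datetime(year, month, day, hour, tzinfo=timezone.utc).timestamp()


def day_of(ts):
    """ISO date (UTC) of an epoch timestamp."""
    return datetime.fromtimestamp(ts, tz=timezone.utc).date().isoformat()


# Hypothetical created_utc values of negative contributions.
negative_timestamps = [
    utc_ts(2021, 1, 5, 9),
    utc_ts(2021, 1, 5, 21),
    utc_ts(2021, 1, 6, 3),
]

per_day = Counter(day_of(ts) for ts in negative_timestamps)
peak_day, peak_count = per_day.most_common(1)[0]
print(peak_day, peak_count)
```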

    On which days did accounts create more negative comments?

    https://pandas.pydata.org/docs/reference/api/pandas.cut.html
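The binning that `pandas.cut` performs can be illustrated without dependencies using `bisect`. By default `pandas.cut` uses right-closed intervals and leaves values outside the bins as missing; the sketch mirrors that by returning `None`. What the notebook actually binned is not shown, so the quantity (account age in days), the bin edges, and the labels are all assumptions.

```python
from bisect import bisect_left

# Assumed bin edges in days and one label per right-closed interval (lo, hi].
EDGES = [0, 1, 7, 30, 365]
LABELS = ["<=1 day", "2-7 days", "8-30 days", "31-365 days"]


def bin_label(value):
    """Label of the right-closed bin containing value, or None if out of range."""
    index = bisect_left(EDGES, value) - 1
    if 0 <= index < len(LABELS):
        return LABELS[index]
    return None


# Hypothetical account ages (days between creation and a negative comment).
ages = [1, 7, 8, 400]
print([bin_label(age) for age in ages])
```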

    Conclusions